Restless Bandits with Constrained Arms: Applications in Social and Information Networks

نویسندگان

  • Varun Mehta
  • Rahul Meshram
  • Kesav Kaza
  • S. N. Merchant
چکیده

We study a problem of information gathering in a social network with dynamically available sources and time varying quality of information. We formulate this problem as a restless multi-armed bandit (RMAB). In this problem, information quality of a source corresponds to the state of an arm in RMAB. The decision making agent does not know the quality of information from sources a priori. But the agent maintains a belief about the quality of information from each source. This is a problem of RMAB with partially observable states. The objective of the agent is to gather relevant information efficiently from sources by contacting them. We formulate this as a infinite horizon discounted reward problem, where reward depends on quality of information. We study Whittle’s index policy which determines the sequence of play of arms that maximizes long term cumulative reward. We illustrate the performance of index policy, myopic policy and compare with uniform random policy through numerical simulation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks

In this work we formulate the problem of restless multi-armed bandits with cumulative feedback and partially observable states. We call these bandits as lazy restless bandits (LRB) as they are slow in action and allow multiple system state transitions during every decision interval. Rewards for each action are state dependent. The states of arms are hidden from the decision maker. The goal of t...

متن کامل

Multi-armed Bandits with Constrained Arms and Hidden States

The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of arms evolve in Markovian manner and the exact states are hidden from the decision maker. First, some structural results on value functions are claimed. Following these results, the optimal policy turns out to be a threshold policy. Further, indexability of rested bandits is ...

متن کامل

Leveraging Side Observations in Stochastic Bandits

This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms. In this setting, after pulling an arm i, the decision maker also observes the rewards for some other actions related to i. We will see that this model is suited to content recommendation in social networks, where users’ reactions may...

متن کامل

Interdependent Security Game Design over Constrained Linear Influence Networks

In today's highly interconnected networks, security of the entities are often interdependent. This means security decisions of the agents are not only influenced by their own costs and constraints, but also are affected by their neighbors’ decisions. Game theory provides a rich set of tools to analyze such influence networks. In the game model, players try to maximize their utilities through se...

متن کامل

Marginal productivity index policies for scheduling restless bandits with switching penalties

We address the dynamic scheduling problem for discrete-state restless bandits, where sequence-independent setup penalties (costs or delays) are incurred when starting work on a project. We reformulate such problems as restless bandit problems without setup penalties, and then deploy the theory of marginal productivity indices (MPIs) and partial conservation laws (PCLs) we have introduced and de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1801.03634  شماره 

صفحات  -

تاریخ انتشار 2018